(57) Abstract

A method of representing a colour image comprises selecting a region of the image, selecting one or more colours as representative
colours for the region, and, for a region having two or more representative colours, calculating for each representative colour at least two
parameters related to the colour distribution in relation to the respective representative colour and using said parameters to derive descriptors
for the image region.
Method and Apparatus for Representing and Searching for Colour Images



The present invention relates to a method and apparatus for 
representing a colour image or a region of an image for searching purposes, 
and a method and apparatus for searching for colour images or image regions. 

Searching techniques based on image content for retrieving still
images and video are known. Various image features, including colour,
texture, edge information, shape and motion, have been used for such
techniques. Applications of such techniques include Internet search
engines, interactive TV, telemedicine and teleshopping.

For the purposes of retrieval of images from an image database,
images or regions of images are represented by descriptors, including 
descriptors based on colours within the image. Various different types of 
colour-based descriptors are known, including the average colour of an image 
region, statistical moments based on colour variation within an image region, 
a representative colour, such as the colour that covers the largest area of an 
image region, and colour histograms, where a histogram is derived for an 
image region by counting the number of pixels in the region of each of a set of 
predetermined colours. 

A known content-based image retrieval system is QBIC (query by
image content) (see US 5579471, MPEG document M4582/P165: Colour
Descriptors for MPEG-7 by IBM Almaden Research Center). In one of the
modes of operation of that system, each image in a database is divided into
blocks. Each block is grouped into subsets of similar colours and the largest
such subset is selected. The average colour of the selected subset is chosen as
the representative colour of the respective block. The representative colour
information for the image is stored in the database. A query in the database
can be made by selecting a query image. Representative colour information
for the query image is derived in the same manner as described above. The
query information is then compared with the information for the images
stored in the database to locate the closest matches.

MPEG document M4582/P437 and US 5586197 disclose a similar
approach, but using a more flexible method of dividing an image into blocks
and a different method of comparing images. In another variation, described
in MPEG document M4582/P576: Colour representation for visual objects, a
single value for each of two representative colours per region is used.

Several techniques for representing images based on colour histograms
have been developed, such as MPEG document M4582/P76: A colour
descriptor for MPEG-7: Variable-Bin colour histogram. Other techniques use
statistical descriptions of the colour distribution in an image region. For
example, MPEG document M4582/P549: Colour Descriptor by using picture
information measure of subregions in video sequences discloses a technique
whereby an image is divided into high and low entropy regions and colour
distribution features are calculated for each type of region. MPEG document
M4852/P319: MPEG-7 Colour Descriptor Proposal describes using a mean
and a covariance value as descriptors for an image region.

All the approaches described above have important shortcomings.
Some of them, in particular colour histogram techniques, are highly accurate,
but require relatively large amounts of storage and processing time. Other
methods, such as the ones using one or two representative colours, have high
storage and computational efficiency but are not precise enough. The
statistical descriptors are a compromise between those two types of
techniques, but they can suffer from lack of flexibility, especially in cases
where colours of pixels vary widely within a region.

The present invention provides a method of representing an image by 
approximating the colour distribution using a number of component
distributions, each corresponding to a representative colour in an image 
region, to derive descriptors of the image region. 

The invention also provides a method of searching for images using
such descriptors.

The invention also provides a computer program for implementing
said methods and a computer-readable medium storing such a computer
program. The computer-readable medium may be a separable medium such
as a floppy disc or CD-ROM or memory such as RAM.

An embodiment of the invention will be described with reference to 

the accompanying drawings of which: 
Fig. 1 is a block diagram of a system according to an embodiment of
the invention; 

Fig. 2 is a flow chart of a first search method; and 

Fig. 3 is a flow chart of a second search method. 
A system according to an embodiment of the invention is shown in
Fig. 1. The system includes a control unit 2 such as a computer for
controlling operation of the system, a display unit 4 such as a monitor,
connected to the control unit 2, for displaying outputs including images and
text, and a pointing device 6 such as a mouse for inputting instructions to the
control unit 2. The system also includes an image database 8 storing digital
versions of a plurality of images and a descriptor database 10 storing
descriptor information, described in more detail below, for each of the images
stored in the image database 8. Each of the image database 8 and the
descriptor database 10 is connected to the control unit 2. The system also
includes a search engine 12 which is a computer program under the control of
the control unit 2 and which operates on the descriptor database 10.

In this embodiment, the elements of the system are provided on a 
single site, such as an image library, where the components of the system are 
permanently linked. 

The descriptor database 10 stores descriptors of all the images stored
in the image database. More specifically, in this embodiment, the descriptor
database 10 contains descriptors for each of a plurality of regions of each
image. The descriptors are derived as described below.
Each image in the database 8 is divided into a number of
non-overlapping rectangular blocks of pixels. For each block, a colour histogram
is then derived, by selecting a predetermined number of colours, and counting
the number of pixels in the block of each colour.

The colour histogram so obtained shows the colour distribution of the
pixels within the block. In general, the region will have one or more
dominant colours, and the histogram will have peaks corresponding to those
colours.
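The histogram step can be illustrated with a minimal Python sketch. The uniform quantisation into `levels` bins per component, the function name `block_histogram` and the sample pixel values are assumptions made for illustration; the text only requires a predetermined set of colours.

```python
from collections import Counter

def block_histogram(pixels, levels=4):
    """Count the pixels of a block that fall into each quantised colour.

    pixels: iterable of (r, g, b) tuples with components in 0..255.
    levels: quantisation levels per component (an assumed value), giving
            levels**3 predetermined colours.
    """
    step = 256 // levels
    hist = Counter()
    for r, g, b in pixels:
        # Map each component to its bin index, clamping to the last bin.
        key = tuple(min(c // step, levels - 1) for c in (r, g, b))
        hist[key] += 1
    return hist

# Hypothetical 10-pixel block: mostly red, some blue, one grey pixel.
block = [(250, 10, 10)] * 6 + [(10, 10, 240)] * 3 + [(128, 128, 128)]
hist = block_histogram(block)
# The most populated bins are the candidates for dominant colours.
print(hist.most_common(2))
```

The peaks of such a histogram are the starting point for the dominant colours discussed in the following paragraphs.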

The descriptors for the blocks are based on the dominant colours as
identified from the histogram. The descriptor for each block has the following
elements:

(1) The number of dominant colours, n, called the degree of the
descriptor, where n ≥ 1; and

for each dominant colour:

(2)(a) a weight representing the relative significance of the respective
dominant colour in the block. Here, the weight is a ratio of the number of
pixels in the block of the relevant colour to the total number of pixels in the
block.

(b) a mean value,

$$\mathbf{m} = \begin{pmatrix} m_x \\ m_y \\ m_z \end{pmatrix}$$

where x, y and z index colour components, for example the red, green and
blue colour components of the colour in RGB colour space. Here, the mean
value corresponds to the colour components of the respective dominant
colour.

(c) a covariance matrix,

$$C = \begin{pmatrix} c_{xx} & c_{xy} & c_{xz} \\ c_{yx} & c_{yy} & c_{yz} \\ c_{zx} & c_{zy} & c_{zz} \end{pmatrix}$$

where $c_{ii}$ represents variance of colour component i and $c_{ij}$ represents
covariance between components i and j. The covariance matrix is
symmetrical ($c_{ij} = c_{ji}$) so only six numbers are needed to store it.
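One possible in-memory layout for such a descriptor is sketched below in Python. The class names `DominantColour` and `BlockDescriptor`, the field names and the sample values are hypothetical; only the elements themselves (degree, weight, mean, six covariance numbers) come from the text.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class DominantColour:
    weight: float                      # fraction of block pixels of this colour
    mean: Tuple[float, float, float]   # (m_x, m_y, m_z)
    # Upper triangle of the symmetric covariance matrix, stored as
    # (c_xx, c_xy, c_xz, c_yy, c_yz, c_zz).
    cov6: Tuple[float, float, float, float, float, float]

    def covariance(self):
        """Expand the six stored numbers into the full 3x3 matrix."""
        cxx, cxy, cxz, cyy, cyz, czz = self.cov6
        return [[cxx, cxy, cxz],
                [cxy, cyy, cyz],
                [cxz, cyz, czz]]

@dataclass
class BlockDescriptor:
    colours: List[DominantColour]

    @property
    def degree(self):
        """Number of dominant colours, n."""
        return len(self.colours)

# Hypothetical descriptor of degree 1 for a mostly red block.
red = DominantColour(1.0, (240.0, 20.0, 20.0), (90.0, 5.0, 4.0, 60.0, 3.0, 55.0))
descriptor = BlockDescriptor([red])
print(descriptor.degree, red.covariance())
```

Storing only the upper triangle reflects the observation that the symmetric matrix needs six numbers rather than nine.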

In obtaining the descriptor as discussed above, the colour distribution
is treated as n different sub-distributions, where n is the number of dominant
colours, each sub-distribution centring about a respective dominant colour as
the mean. The ranges of the sub-distributions may well overlap, and a
suitable algorithm is used to determine the range of each distribution for
calculating the weight, mean and covariance matrix, as will be understood by
a person skilled in the art. One way of estimating the descriptor components
is to fit Gaussian functions centred at histogram peaks to the histogram by
minimising the difference between the actual histogram counts and values
estimated from the mixture of Gaussian functions.
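As a rough illustration of how weight, mean and covariance might be estimated, the Python sketch below uses a simpler substitute for the Gaussian fitting mentioned above: each pixel is assigned to its nearest histogram peak and sample statistics are computed per group. This hard assignment, the function name `estimate_components` and the sample data are assumptions; note also that the mean here is the sample mean of the group rather than the peak colour itself.

```python
def estimate_components(pixels, peaks):
    """Return a list of (weight, mean, covariance), one entry per non-empty peak.

    pixels: list of (r, g, b) tuples; peaks: list of (r, g, b) peak colours.
    """
    groups = [[] for _ in peaks]
    for p in pixels:
        # Assign the pixel to the closest peak (squared Euclidean distance).
        k = min(range(len(peaks)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, peaks[i])))
        groups[k].append(p)

    result = []
    for g in groups:
        if not g:
            continue  # a peak that attracted no pixels contributes nothing
        n = len(g)
        mean = [sum(p[d] for p in g) / n for d in range(3)]
        # Population covariance of the group about its mean.
        cov = [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in g) / n
                for j in range(3)] for i in range(3)]
        result.append((n / len(pixels), mean, cov))
    return result

# Hypothetical block with two tight clusters of colours.
pixels = [(0, 0, 0), (2, 0, 0), (100, 100, 100), (102, 100, 100)]
peaks = [(1, 0, 0), (101, 100, 100)]
components = estimate_components(pixels, peaks)
for weight, mean, cov in components:
    print(weight, mean, cov[0][0])
```

A least-squares fit of a Gaussian mixture to the histogram, as the text suggests, would handle overlapping sub-distributions more gracefully than this nearest-peak split.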

The descriptor database 10 stores a descriptor as defined above for
each block of each image stored in the image database 8. The representation
of the colour distribution within each block using the descriptor structure
described above contains a large amount of descriptive information, but
requires less storage space than, for example, full histogram information.

As an example, a colour histogram for a specific block may exhibit
three peaks corresponding to three dominant colours. The histogram colour
distribution is analysed as three colour sub-distributions and results in a
descriptor including the number three indicating the number of dominant
colours, three weights, three mean vectors, corresponding to the colour
vectors for the three peaks, and three corresponding covariance matrices.
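The storage implied by this example can be counted with a short Python sketch. It assumes one stored number for the degree, one per weight, three per mean vector and six per symmetric covariance matrix; the function name `descriptor_size` and the 64-bin histogram used for comparison are hypothetical.

```python
def descriptor_size(n, components=3):
    """Number of values stored for a descriptor of degree n."""
    # A symmetric components x components matrix has this many distinct entries.
    cov_terms = components * (components + 1) // 2
    # One value for the degree, then weight + mean + covariance per colour.
    return 1 + n * (1 + components + cov_terms)

size_degree_3 = descriptor_size(3)
print(size_degree_3)           # 1 + 3 * (1 + 3 + 6) = 31 values
print(size_degree_3 < 64)      # compared with a hypothetical 64-bin histogram
```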

The system is used to search for images in the image database using 
the descriptors stored in the descriptor database. The present embodiment 
provides two search methods: a single colour based search and a region based
search.

The single colour based search will be described with reference to the 
flowchart shown in Fig. 2. 

In the single colour based search, the user inputs a query by selecting a
colour to be searched, using the pointing device 6 and a menu such as a colour
wheel or a palette displayed on the display unit 4 (step 102). The control unit
2 then obtains the corresponding colour vector for the query colour, the colour
vector having components which are the respective colour components for the
query colour, that is, the red, green and blue components (step 104).

The control unit 2 then uses the search engine 12 to search for images
in the image database 8 that include the query colour. The search engine 12
performs a matching procedure using the query colour vector and the
descriptors for the image blocks in the descriptor database 10 (step 106).
The matching procedure is performed using the following formula for
calculating a matching value M.

$$M = \exp\left[-\frac{1}{2}(q-m)^{T}C^{-1}(q-m)\right]$$

where q is the query colour vector. A matching value is calculated for each
dominant colour in each block using each value of m and C in the descriptor
for the block. Thus, for a descriptor of degree n, n matching values are
obtained.

The matching value can be considered as the value of the probability
density function corresponding to each colour sub-distribution in the block at
the point defined by the query colour value, modelling the probability density
function as a Gaussian function.
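A minimal Python sketch of this matching value follows, assuming M is the unnormalised Gaussian exp(-1/2 (q - m)^T C^-1 (q - m)) for a three-component colour space. The helper names `det3`, `solve3` and `matching_value`, and the sample mean and covariance, are illustrative only.

```python
import math

def det3(a):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))

def solve3(a, b):
    """Solve a x = b for a 3x3 matrix using Cramer's rule."""
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("covariance matrix is singular")
    x = []
    for col in range(3):
        # Replace one column of a with b and take the determinant ratio.
        m = [[b[r] if c == col else a[r][c] for c in range(3)] for r in range(3)]
        x.append(det3(m) / d)
    return x

def matching_value(q, m, C):
    """Matching value of query colour q against one dominant colour (m, C)."""
    d = [qi - mi for qi, mi in zip(q, m)]
    y = solve3(C, d)  # y = C^-1 (q - m), without forming the inverse
    return math.exp(-0.5 * sum(di * yi for di, yi in zip(d, y)))

# Hypothetical dominant colour: reddish mean, variance 100 per component.
C = [[100.0, 0.0, 0.0], [0.0, 100.0, 0.0], [0.0, 0.0, 100.0]]
m = (200.0, 30.0, 30.0)
exact = matching_value((200, 30, 30), m, C)   # query equal to the mean
near = matching_value((210, 30, 30), m, C)    # one standard deviation away in red
print(exact, near)
```

A query equal to the mean gives the maximum value of 1, and the value falls off as the query colour moves away from the dominant colour relative to its spread.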

For a given descriptor, the larger a matching value M, the closer the
corresponding block is to a match with the selected colour.

When matching values have been calculated for each descriptor in the
descriptor database 10, the search engine 12 orders the results by the size of M
starting with the largest values of M, considering only the largest value of M
for any descriptors of degree greater than one (step 108).

The control unit 2 takes the results of the matching procedure from the
search engine 12, and retrieves from the image database a predetermined
number K of those images which are the closest matches, corresponding to the
K highest values of M. Those images are then displayed on the display unit 4
(step 110). The set-up of the control unit 2 determines how many of the
closest matches are to be displayed on the display unit. That number can be
changed by the user.
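The ordering and selection of the K closest matches might be sketched in Python as follows. The input layout (one list of matching values per block descriptor, tagged with an image identifier), the function name `top_k_images` and the rule of scoring an image by its best block are assumptions for illustration.

```python
import heapq

def top_k_images(block_scores, k):
    """Return the k images with the highest matching values.

    block_scores: iterable of (image_id, values) pairs, where values holds the
    matching values M of one block descriptor (one value per dominant colour).
    """
    best = {}
    for image_id, values in block_scores:
        m = max(values)  # only the largest M of a descriptor is considered
        # Assumed rule: an image is scored by its best-matching block.
        best[image_id] = max(best.get(image_id, 0.0), m)
    return heapq.nlargest(k, best.items(), key=lambda item: item[1])

# Hypothetical matching values for blocks of three images.
scores = [("a", [0.2, 0.9]), ("b", [0.5]), ("a", [0.1]), ("c", [0.7])]
ranked = top_k_images(scores, 2)
print(ranked)
```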

As will be understood from the above description, the single colour 
based search retrieves images from the image database 8 which have a block 
which has a dominant colour which is the same as or close to the colour
initially selected by the user. 

The region based search will be described with reference to the 
flowchart shown in Fig. 3. 

In the region based search, the control unit 2 operates to display a
predetermined set of search images, which are images from the image
database 8, on the display unit 4 (step 202). The search images may be
wholly determined by the set-up of the control unit, or may depend on other
requirements input by the user. For example, in a larger system supporting
keyword-based searches the user might input the word "leaves" which would
result in a predetermined set of images depicting leaves being shown as the
images for colour based search.

Each of the search images is shown with a grid dividing the image into
blocks, corresponding to blocks for which the descriptors have been derived.
The user then selects, using the pointing device 6, a block on one of the
images which shows a colour distribution of interest (step 204).

The control unit 2 then retrieves the descriptor for the selected image
block from the descriptor database 10 and uses it as a query descriptor (step
206). The descriptor is already available because the search images are taken
from the image database 8. The search engine then performs a search
comparing that query descriptor with the other descriptors stored in the
descriptor database using matching functions (step 208).

For a query descriptor having a mean value $m_a$ and covariance matrix
$C_a$ for one of the dominant colours and another descriptor having a mean
value $m_b$ and covariance matrix $C_b$ for one of the dominant colours, a
matching function is defined as:

$$m_s(a,b) = \int \exp\left[-\frac{1}{2}(q-m_a)^{T}C_a^{-1}(q-m_a)\right]\exp\left[-\frac{1}{2}(q-m_b)^{T}C_b^{-1}(q-m_b)\right]dq$$

where q is a 3-d vector akin to a colour vector and where the integral is
calculated over the range from (0, 0, 0) to (255, 255, 255) where 255 is the
maximum value of a colour component. The range of the integral in other
embodiments will depend upon the colour co-ordinate system and
representation used.

This is equivalent to modelling the corresponding colour
sub-distributions for the image blocks as probability mass functions in the form of
Gaussian functions, and determining the degree to which they overlap, or in
other words determining the similarity between them. The larger the result of
the above calculation, the closer are the corresponding colour distributions. In
this case, the function determines the degree to which a colour
sub-distribution in the query image block and a colour sub-distribution in a stored
image overlap.
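For orientation, the overlap of two Gaussian functions has a closed form if the integration range is extended from the colour cube to the whole of three-dimensional space. That extension is an assumption made here, not part of the embodiment, and is a reasonable approximation only when both sub-distributions lie well inside the cube. Writing $m_a$, $C_a$ and $m_b$, $C_b$ for the two means and covariance matrices, the standard Gaussian product identity gives:

```latex
\int_{\mathbb{R}^3}
  \exp\!\left[-\tfrac{1}{2}(q-m_a)^{T}C_a^{-1}(q-m_a)\right]
  \exp\!\left[-\tfrac{1}{2}(q-m_b)^{T}C_b^{-1}(q-m_b)\right] dq
= (2\pi)^{3/2}\sqrt{\frac{|C_a|\,|C_b|}{|C_a+C_b|}}\;
  \exp\!\left[-\tfrac{1}{2}(m_a-m_b)^{T}(C_a+C_b)^{-1}(m_a-m_b)\right]
```

The exponential factor shows that the overlap decays with the distance between the two means measured relative to the combined spread $C_a + C_b$, which matches the interpretation of the result as a similarity between the two sub-distributions.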



WO 00/67203 PCT/GB00/01667

The full matching function for matching one descriptor with another is
defined as:

$$m_f = \sum_{a}\sum_{b} v_a\, w_b\, m_s(a,b)$$

where v and w are weights for sub-distributions, and the summation is over all
sub-distributions in both regions.

Thus, for each dominant colour described in the descriptor of a query 
image block, a matching value is calculated with respect to each dominant 
colour in a descriptor from the descriptor database 10. The resulting matching 
values are weighted and then summed to give a final matching value
corresponding to $m_f$.
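The weighting and summation described above can be sketched in Python as a double loop over the sub-distributions of both descriptors. The function name `full_match`, the representation of a descriptor as a list of (weight, component) pairs and the toy pairwise function are assumptions; in practice the pairwise function would be the Gaussian overlap between two sub-distributions.

```python
def full_match(query, stored, pair_match):
    """Weighted sum of pairwise matching values between two descriptors.

    query, stored: lists of (weight, component) pairs, one per dominant colour.
    pair_match: callable returning the matching value of two components.
    """
    return sum(v * w * pair_match(a, b)
               for v, a in query
               for w, b in stored)

def toy_pair_match(a, b):
    """Placeholder pairwise measure: 1 for identical labels, else 0."""
    return 1.0 if a == b else 0.0

# Hypothetical descriptors whose components are reduced to colour labels.
query = [(0.6, "red"), (0.4, "blue")]
stored = [(0.5, "red"), (0.5, "green")]
score = full_match(query, stored, toy_pair_match)
print(score)
```

Passing the pairwise measure as a parameter keeps the weighting logic separate from the choice of overlap function.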

Full matching values are calculated as described above for all 
descriptors in the database with respect to the query descriptor. As in the 
single colour based search, the results are ordered (step 210), and the K 
images with the highest matching values, indicating the closest matches, are 
displayed on the display unit for the user (step 212).

A further iteration of a search can be performed by selecting an image 
region in an image found in the previous search.

Matching can be done using other similarity measures than those 
described above. A further example is given below. 
For a pair of descriptors, $F_1$ and $F_2$, for two regions, a similarity
measure D is defined as follows:

$$D(F_1,F_2) = \sum_{i=1}^{N_1}\sum_{j=1}^{N_1} p_{1i}\,p_{1j}\,f_{1i1j} + \sum_{i=1}^{N_2}\sum_{j=1}^{N_2} p_{2i}\,p_{2j}\,f_{2i2j} - \sum_{i=1}^{N_1}\sum_{j=1}^{N_2} 2\,p_{1i}\,p_{2j}\,f_{1i2j}$$

where

$$f_{xiyj} = \frac{1}{2\pi\sqrt{v_{xiyjl}\,v_{xiyju}\,v_{xiyjv}}}\exp\left[-\left(\frac{c_{xiyjl}}{v_{xiyjl}} + \frac{c_{xiyju}}{v_{xiyju}} + \frac{c_{xiyjv}}{v_{xiyjv}}\right)\Big/2\right]$$

and

$$c_{xiyjl} = (c_{xil} - c_{yjl})^{2}, \qquad v_{xiyjl} = v_{xil} + v_{yjl}$$

Here, i and j index the representative colours;
x and y index the descriptors;
$N_1$ is the number of representative colours in the first descriptor;
$N_2$ is the number of representative colours in the second descriptor;
$p_{1i}$ is the ith weight in the first descriptor;
$p_{2j}$ is the jth weight in the second descriptor;
l, u and v represent colour components, such as red, green and blue
colour components in this specific example; and
c and v are the dominant colour values (mean values) and colour
variances respectively, so $c_{xil}$ is the lth component of the ith representative
colour value of the xth descriptor, and $v_{xil}$ is the lth component of the variance
of the ith representative colour of the xth descriptor etc.

In contrast with the matching functions described previously, for
descriptors $F_1$ and $F_2$, the smaller the value of D, the closer is the match
between the regions corresponding to the descriptors $F_1$ and $F_2$. Accordingly,
the values D resulting from a search procedure such as described above are
ordered in increasing size starting with the smallest value of D. Otherwise,
the searching and matching procedure can be carried out substantially as
described above, with appropriate modifications to take account of the
different similarity measure. It will be noted that this similarity measure uses
the variances, and not the covariance matrix. Thus, the descriptor for a region
includes the variances but does not need the covariance matrix. Accordingly,
the storage requirement is reduced compared with the descriptor described
previously.
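A variance-based measure of this kind can be sketched in Python under stated assumptions: each pairwise term is taken to be a product of per-component Gaussians whose variance is the sum of the two variances, scaled by 1 / (2π √(product of summed variances)), and the cross term between the two descriptors is counted twice so that D vanishes for identical descriptors. The function names `f_term` and `similarity_d` and the sample descriptors are hypothetical.

```python
import math

def f_term(c1, v1, c2, v2):
    """Pairwise term for two representative colours.

    c1, c2: mean colour triples; v1, v2: per-component variance triples.
    """
    quad = 0.0
    prod = 1.0
    for a, b, va, vb in zip(c1, c2, v1, v2):
        v = va + vb                # combined variance for this component
        quad += (a - b) ** 2 / v   # squared mean difference over variance
        prod *= v
    return math.exp(-quad / 2) / (2 * math.pi * math.sqrt(prod))

def similarity_d(F1, F2):
    """D for two descriptors given as lists of (weight, mean, variance)."""
    def cross(A, B):
        return sum(pa * pb * f_term(ca, va, cb, vb)
                   for pa, ca, va in A
                   for pb, cb, vb in B)
    return cross(F1, F1) + cross(F2, F2) - 2 * cross(F1, F2)

# Hypothetical single-colour descriptors: a red region and a blue region.
red = [(1.0, (200.0, 30.0, 30.0), (100.0, 100.0, 100.0))]
blue = [(1.0, (30.0, 30.0, 200.0), (100.0, 100.0, 100.0))]
d_same = similarity_d(red, red)
d_diff = similarity_d(red, blue)
print(d_same, d_diff)
```

Only three variances per representative colour are needed here, which is where the storage saving over the full covariance matrix comes from.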

A system according to the invention may, for example, be provided in 
an image library. Alternatively, the databases may be sited remote from the 
control unit of the system, connected to the control unit by a temporary link
such as a telephone line or by a network such as the Internet. The image and
descriptor databases may be provided, for example, in permanent storage or
on portable data storage media such as CD-ROMs or DVDs. 

In the above description, the colour representations have been 
described in terms of red, green and blue colour components. Of course, other 
representations can be used, such as a representation using a hue, saturation 
and intensity, or YUV co-ordinate system, or a subset of colour components 
in any colour space, for example only hue and saturation in HSI. 

The embodiment of the invention described above uses descriptors
derived for rectangular blocks of images. Other sub-regions of the image
could be used as the basis for the descriptors. For example, regions of
different shapes and sizes could be used. Alternatively, descriptors may be
derived for regions of the image corresponding to objects, for example, a car,
a house or a person. In either case, descriptors may be derived for all of the
image or only part of it.
In the search procedure, instead of inputting a simple colour query or
selecting an image block, the user can, for example, use the pointing device to
describe a region of an image, say, by encircling it, whereupon the control
unit derives a descriptor for that region and uses it for searching in a similar
manner as described above. Also, instead of using images already stored in
the image database for initiating a search, an image could be input into the
system using, for example, an image scanner or a digital camera. In order to
perform a search in such a situation, again the system first derives descriptors
for the image or regions of the image, either automatically or as determined
by the user.

Appropriate aspects of the invention can be implemented using
hardware or software.

In the above embodiments, the component sub-distributions for each
representative colour are approximated using Gaussian functions, and the
mean and covariances of those functions are used as descriptor values.
However, other functions or parameters can be used to approximate the
component distributions, for example, using functions such as sine and
cosine, with descriptors based on those functions.
CLAIMS

1. A method of representing a colour image comprising selecting a 
region of the image, selecting one or more colours as representative colours 
for the region and, for a region having two or more representative colours,
calculating for each representative colour at least two parameters related to the
colour distribution in relation to the respective representative colour and using
said parameters to derive descriptors for the image region. 

2. A method as claimed in claim 1 wherein the parameters are
statistical values related to the colour distribution in relation to the respective
representative colour in the region.

3. A method as claimed in claim 1 or claim 2 comprising storing 
said descriptors in data storage means.

4. A method as claimed in any one of claims 1 to 3 wherein the 
step of selecting representative colours comprises deriving a colour histogram 
for the region. 


5. A method as claimed in claim 4 wherein the step of selecting 
representative colours comprises identifying local peaks in the colour 
histogram and selecting the corresponding colours as representative colours. 
6. A method as claimed in claim 3 dependent on claim 2 wherein
the local peaks are treated as mean values for colour distributions within the
region and said statistical values are the first two central moments of the
respective distribution.

7. A method as claimed in any one of claims 1 to 6 wherein the 
image region is independent of the image content. 

8. A method as claimed in claim 7 wherein the image region is a
polygon.

9. A method as claimed in any one of claims 1 to 6 wherein the 
image region corresponds to an object. 


10. A method of representing a colour image by processing signals 
corresponding to said image, the method comprising selecting a region of the
image, identifying a number of representative colours for the region, and, for a
region having two or more representative colours, deriving a function
approximating the colour distribution corresponding to each representative
colour, and using said functions to define colour descriptors of said region. 
11. A method of representing a colour image by processing signals
corresponding to said image, the method comprising selecting a region of the
image, identifying a number of representative colours for the region, and, for a
region having more than one representative colour, deriving for each
representative colour an indication of the spread of the colour distribution in
relation to the representative colour and using said indication to derive
descriptors for the image region.



12. A method of searching for colour images stored in data storage 
means comprising inputting a query relating to colour of an image, comparing
said query with descriptors for stored images derived in accordance with a
method as claimed in any one of claims 1 to 11 using a matching function and
selecting and displaying at least one image for which the matching function
indicates a close match between the query and at least part of the image.

13. A method as claimed in claim 12 wherein inputting a query
comprises selecting a query image region and obtaining descriptors for said
image region in accordance with a method as claimed in any one of claims 1
to 11 and wherein the matching function uses the descriptors for the query and
for the stored images.
14. A method as claimed in claim 12 or claim 13 wherein the
matching function is based on

$$M = \exp\left[-\frac{1}{2}(q-m)^{T}C^{-1}(q-m)\right]$$

where q is a colour vector corresponding to a query and m and C are
descriptor values representing first and second central moments of the colour
distribution for a representative colour.



15. A method as claimed in claim 12 or claim 13 wherein the
matching function is based on

$$m_f = \sum_{a}\sum_{b} v_a\, w_b\, m_s(a,b)$$

where

$$m_s(a,b) = \int \exp\left[-\frac{1}{2}(q-m_a)^{T}C_a^{-1}(q-m_a)\right]\exp\left[-\frac{1}{2}(q-m_b)^{T}C_b^{-1}(q-m_b)\right]dq$$

and where m and C are descriptor values representing first and second central
moments for colour distributions for representative colours.
16. A method as claimed in claim 12 or claim 13 wherein the
matching function is based on

$$D(F_1,F_2) = \sum_{i=1}^{N_1}\sum_{j=1}^{N_1} p_{1i}\,p_{1j}\,f_{1i1j} + \sum_{i=1}^{N_2}\sum_{j=1}^{N_2} p_{2i}\,p_{2j}\,f_{2i2j} - \sum_{i=1}^{N_1}\sum_{j=1}^{N_2} 2\,p_{1i}\,p_{2j}\,f_{1i2j}$$

where

$$f_{xiyj} = \frac{1}{2\pi\sqrt{v_{xiyjl}\,v_{xiyju}\,v_{xiyjv}}}\exp\left[-\left(\frac{c_{xiyjl}}{v_{xiyjl}} + \frac{c_{xiyju}}{v_{xiyju}} + \frac{c_{xiyjv}}{v_{xiyjv}}\right)\Big/2\right]$$

and

$$c_{xiyjl} = (c_{xil} - c_{yjl})^{2}, \qquad v_{xiyjl} = v_{xil} + v_{yjl}$$

where

D is the similarity measure;
$F_1$ represents the first descriptor;
$F_2$ represents the second descriptor;
i and j index the representative colours;
x and y index the descriptors;
$N_1$ is the number of representative colours in the first descriptor;
$N_2$ is the number of representative colours in the second descriptor;
$p_{1i}$ is the ith weight in the first descriptor;
$p_{2j}$ is the jth weight in the second descriptor;
l, u and v represent colour components; and
c and v are the dominant colour values (mean values) and variances
respectively.



17. A method as claimed in any one of claims 12 to 16 wherein a 
query is selected from a plurality of images displayed on display means. 
18. A method as claimed in any one of claims 12 to 17 wherein
inputting a query comprises selecting a single colour value. 



19. A method as claimed in any one of claims 12 to 17 wherein 
inputting a query comprises specifying one or more component distributions. 



20. A method as claimed in any one of claims 12 to 17 wherein a 
query is input using only some of the components of the colour space. 

21. An apparatus for implementing a method according to any one
of claims 1 to 20.

22. A computer system programmed to operate according to a 
method as claimed in any one of claims 1 to 20. 


23. A computer program for implementing a method as claimed in 
any one of claims 1 to 20. 

24. A computer-readable medium storing computer-executable 
process steps for implementing a method as claimed in any one of claims 1 to
20.
25. A method for searching for colour images by processing
signals substantially as hereinbefore described with reference to the 
accompanying drawings. 

26. A computer system substantially as hereinbefore described 
with reference to the accompanying drawings. 



